
[ENH] Add participant+sessions.tsv for session-varying participant metadata#2403

Draft
yarikoptic wants to merge 1 commit into master from enh-subject+sessions



@yarikoptic yarikoptic commented Apr 18, 2026

Collaborator

Introduces a single new optional dataset-level file participant+sessions.tsv with a composite index [participant_id, session_id]. This provides a single top-level location for metadata that varies across both participants and sessions -- e.g. age at each visit, body weight, clinical scores in longitudinal studies -- complementing the existing participants.tsv (participant-constant) and per-subject *_sessions.tsv files.
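A minimal sketch of what a composite index means in practice, assuming the file is a plain TSV whose first two columns are participant_id and session_id. The file content, the body_weight values, and the helper name load_composite_index are hypothetical and only illustrate the idea that each (participant_id, session_id) pair identifies exactly one row.

```python
import csv
import io

# Hypothetical file content; the body_weight values are made up for illustration.
EXAMPLE_TSV = (
    "participant_id\tsession_id\tbody_weight\n"
    "sub-01\tses-1\t70.5\n"
    "sub-01\tses-2\t71.0\n"
    "sub-02\tses-1\t64.2\n"
)


def load_composite_index(text):
    """Index rows by (participant_id, session_id), rejecting duplicate keys."""
    table = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        key = (row.pop("participant_id"), row.pop("session_id"))
        if key in table:
            raise ValueError(f"duplicate composite key: {key}")
        table[key] = row
    return table


table = load_composite_index(EXAMPLE_TSV)
print(table[("sub-01", "ses-2")]["body_weight"])  # prints 71.0
```

Neither column is unique on its own here (sub-01 and ses-1 both repeat); only the pair is.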

Note that it is already possible to provide such metadata in sub-*/sub-*_sessions.tsv files, so this approach simply offers a way to provide an "aggregate" collection of that metadata. We might therefore need to define how it interacts with the inheritance principle, but defining that for .tsv files in general is still a TODO.
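To illustrate the "aggregate" relationship, here is a rough sketch that stacks per-subject sessions tables into rows suitable for a top-level file. It assumes every per-subject table has the same columns; the in-memory PER_SUBJECT mapping, the age values, and both helper names are hypothetical, and nothing here settles how the inheritance principle should treat the two locations.

```python
import csv
import io

# Hypothetical per-subject tables, keyed by participant label, standing in for
# the contents of sub-01/sub-01_sessions.tsv and sub-02/sub-02_sessions.tsv.
PER_SUBJECT = {
    "sub-01": "session_id\tage\nses-1\t30\nses-2\t31\n",
    "sub-02": "session_id\tage\nses-1\t25\n",
}


def aggregate_sessions(per_subject):
    """Stack per-subject session rows, prepending participant_id to each row."""
    rows = []
    for participant_id in sorted(per_subject):
        reader = csv.DictReader(io.StringIO(per_subject[participant_id]), delimiter="\t")
        for row in reader:
            rows.append({"participant_id": participant_id, **row})
    return rows


def to_tsv(rows):
    """Serialize rows as TSV, using the first row's keys as the header."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


aggregate = aggregate_sessions(PER_SUBJECT)
print(to_tsv(aggregate))
```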

The + in the filename signals a composite index, following the convention proposed in

inspired by a work on BEP036 (@bids-standard/bep036):

Most of the changes are a straightforward interpolation between the participants.tsv and sessions.tsv file definitions.

One notable change is to meta/context.yaml, where we added dataset.sessions (the union of all session directories across subjects) to enable session-level validation checks. I think that is only reasonable given that we already include dataset-level summaries for datatypes and modalities, but it would require bids-validator to support it. The alternative is to drop it along with the extra check we added.
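As a sketch of the idea only (not the validator's or the schema's actual implementation), the union of session directories and a session-level check could look like the following. The path list and both function names are hypothetical, and it is an assumption that the stored values keep the ses- prefix.

```python
from pathlib import PurePosixPath

# Hypothetical dataset layout, given as relative paths.
PATHS = [
    "sub-01/ses-1/anat/sub-01_ses-1_T1w.nii.gz",
    "sub-01/ses-2/anat/sub-01_ses-2_T1w.nii.gz",
    "sub-02/ses-1/func/sub-02_ses-1_task-rest_bold.nii.gz",
    "participants.tsv",
]


def dataset_sessions(paths):
    """Return the sorted union of ses-* directory names found under sub-* directories."""
    sessions = set()
    for path in paths:
        parts = PurePosixPath(path).parts
        if len(parts) > 2 and parts[0].startswith("sub-") and parts[1].startswith("ses-"):
            sessions.add(parts[1])
    return sorted(sessions)


def unknown_sessions(session_ids, known):
    """List session_id values that match no session directory in the dataset."""
    return sorted(set(session_ids) - set(known))


known = dataset_sessions(PATHS)
print(known)  # ['ses-1', 'ses-2']
print(unknown_sessions(["ses-1", "ses-3"], known))  # ['ses-3']
```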

Ideally, though, we should figure out how to validate specific combinations of subjects and sessions; a TODO was left for that.
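One possible shape for such a combination check, sketched under the assumption that a row is only expected when the matching sub-*/ses-* directory exists. Whether a row without a directory should count as an error or merely a warning is not settled in this thread, and the function names and paths below are hypothetical.

```python
def existing_pairs(paths):
    """Collect (sub-*, ses-*) directory pairs from relative POSIX-style paths."""
    pairs = set()
    for path in paths:
        parts = path.split("/")
        if len(parts) > 2 and parts[0].startswith("sub-") and parts[1].startswith("ses-"):
            pairs.add((parts[0], parts[1]))
    return pairs


def missing_pairs(index_keys, paths):
    """Return composite keys from the TSV that have no matching sub-*/ses-* directory."""
    return sorted(set(index_keys) - existing_pairs(paths))


paths = [
    "sub-01/ses-1/anat/sub-01_ses-1_T1w.nii.gz",
    "sub-02/ses-2/anat/sub-02_ses-2_T1w.nii.gz",
]
keys = [("sub-01", "ses-1"), ("sub-01", "ses-2"), ("sub-02", "ses-2")]
print(missing_pairs(keys, paths))  # [('sub-01', 'ses-2')]
```

Note that ses-2 does exist somewhere in this dataset (under sub-02), so a check against the union of sessions alone would not flag the sub-01/ses-2 row; only the pairwise check does.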

An example participant+sessions.tsv with a body_weight column for the existing 7t_trt bids-examples dataset is at

where a look at the original participants.tsv also makes it fairly obvious that duplicating all entries across all sessions would be dubious.
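The alternative to duplication is to keep participant-constant columns in participants.tsv and join them to the per-session rows only when reading. A small sketch, in which the sex and handedness columns and all values are hypothetical placeholders rather than the actual content of the 7t_trt dataset:

```python
# Hypothetical values; only the column names participant_id, session_id, and
# body_weight come from the discussion.
participants = {
    "sub-01": {"sex": "F", "handedness": "R"},
    "sub-02": {"sex": "M", "handedness": "L"},
}
participant_sessions = {
    ("sub-01", "ses-1"): {"body_weight": "70.5"},
    ("sub-01", "ses-2"): {"body_weight": "71.0"},
    ("sub-02", "ses-1"): {"body_weight": "64.2"},
}


def joined_rows(participants, participant_sessions):
    """Attach participant-constant columns to each per-session row on demand."""
    rows = []
    for (participant_id, session_id), varying in sorted(participant_sessions.items()):
        constant = participants.get(participant_id, {})
        rows.append(
            {"participant_id": participant_id, "session_id": session_id, **constant, **varying}
        )
    return rows


rows = joined_rows(participants, participant_sessions)
print(rows[1])
```

The constant columns are stored once per participant and repeated only in the joined view, not on disk.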

  • I think overall we can state that it closes Age at session #1020, which theoretically could have been closed with the original introduction of _sessions.tsv files.

TODOs

…tadata

Introduces a single new optional dataset-level file `participant+sessions.tsv`
with a composite index `[participant_id, session_id]`.  This provides a single
top-level location for metadata that varies across both participants and
sessions -- e.g. age at each visit, body weight, clinical scores in
longitudinal studies -- complementing the existing `participants.tsv`
(participant-constant) and per-subject `*_sessions.tsv` files.

Note that it is already possible to provide such metadata in
`sub-*/sub-*_sessions.tsv` files, so this approach simply offers a way to
provide an "aggregate" collection of that metadata.  We might therefore need
to define how it interacts with the inheritance principle, but defining that
for .tsv files in general is still a TODO.

The `+` in the filename signals a composite index, following the
convention proposed in #2273 and alternative to freshly proposed #2402 inspired by a work on BEP036

- #2123

hence attn @bids-standard/bep036 .

Most of the changes are a straightforward interpolation between the
`participants.tsv` and `sessions.tsv` file definitions.

One notable change is to `meta/context.yaml`, where we added
`dataset.sessions` (the union of all session directories across subjects) to
enable session-level validation checks.  I think that is only reasonable given
that we already include dataset-level summaries for datatypes and modalities,
but it would require bids-validator to support it.  The alternative is to drop
it along with the extra check we added.

Ideally, though, we should figure out how to validate specific combinations
of subjects and sessions; a TODO was left for that.

An example `participant+sessions.tsv` with a `body_weight` column for the
existing `7t_trt` bids-examples dataset is at

- bids-standard/bids-examples#556

where a look at the original `participants.tsv` also makes it fairly obvious
that duplicating all entries across all sessions would be dubious.

- implements a single first manifestation for #2273
- I think overall we can state that it closes #1020, which theoretically could
  have been closed with the original introduction of `_sessions.tsv` files.

Co-Authored-By: Claude Code 2.1.113 / Claude Opus 4.6 <noreply@anthropic.com>
@codecov

codecov Bot commented Apr 18, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.07%. Comparing base (6764039) to head (4f116b4).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2403   +/-   ##
=======================================
  Coverage   83.07%   83.07%           
=======================================
  Files          22       22           
  Lines        1696     1696           
=======================================
  Hits         1409     1409           
  Misses        287      287           


@yarikoptic added the enhancement, schema, and phenotype labels Apr 20, 2026
@dmoracze


Is the proposed participants+sessions.tsv the only place to store data that varies within participant across sessions?
For example, would a longitudinal dataset that collects demographics and surveys at each timepoint be required to put every item in this file? If so, this file would get out of hand quickly.

Or is participants+sessions.tsv a record of joint indices that can be used in other places, say phenotype/?

@yarikoptic

Collaborator Author

Thank you @dmoracze for your question! TL;DR: the full and exhaustive answer is "NO", and I added a TODO to the original description to elaborate on that!

The quick extended answer was alluded to in the original description:

complementing the existing participants.tsv (participant-constant) and per-subject *_sessions.tsv files.

So we already have that mechanism and it could be used in cases where more appropriate.

And, although we have yet to improve and formalize the inheritance principle as it applies to .tsv files, and to this composite indexing in particular, such metadata could potentially be present in both the top-level participants+sessions.tsv and the sub-.../sub-..._sessions.tsv file(s) (hence the TODO).

@dmoracze

dmoracze commented Apr 24, 2026


Ah ha, yes, that makes sense. I'm both trying to mesh this enhancement with our group's typical use cases and brainstorming a way forward for #2123.

We often curate datasets with multiple visits and data types and many times each visit does not contain all data types. Take, for example, two visits where only survey/demographic data is collected, say pre/post drug administration where MR scan sessions take place between the survey sessions. The current spec recommends aggregating these data into dataset/phenotype. If we have the same survey given at multiple sessions, I see the following options, none of them satisfactory:

  1. Storing all aggregated multi-session survey data within the proposed participants+sessions.tsv. This can make the file very large and unwieldy.

  2. Do not aggregate the data and store within the corresponding subject directory, with the implication that the user will need to aggregate over many tsv files. There are two options here:

    • Store the multi-session survey data in sessions.tsv, but then IMO you get the same large and unwieldy files as in 1, except now there is one per subject
    • Store the multi-session survey data in /dataset/subject/session/phenotype, but currently the phenotype directory is only allowed at the top level

Perhaps this is an edge case in the field as a whole, but we curate many datasets like this. The other option is to allow some type of composite index in TSVs stored in dataset/phenotype, as described in #2123.
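For concreteness, a rough sketch of what a composite index in a phenotype table might allow, assuming a long-format TSV with one row per participant and session. The table content, the total_score column, the session labels, and the helper name are all hypothetical and are not taken from #2123.

```python
import csv
import io
from collections import defaultdict

# Hypothetical content of a phenotype/ survey table with one row per participant
# and session; the column names and scores are made up.
SURVEY_TSV = (
    "participant_id\tsession_id\ttotal_score\n"
    "sub-01\tses-pre\t12\n"
    "sub-01\tses-post\t9\n"
    "sub-02\tses-pre\t15\n"
)


def scores_by_participant(text, column):
    """Group one column's values per participant, keyed by session."""
    grouped = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        grouped[row["participant_id"]][row["session_id"]] = row[column]
    return dict(grouped)


scores = scores_by_participant(SURVEY_TSV, "total_score")
print(scores["sub-01"])  # {'ses-pre': '12', 'ses-post': '9'}
```

With this layout each survey stays in its own file, so no single table has to carry every item from every instrument.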

That is my current understanding of the situation, let me know if I'm misunderstanding your proposal.

@yarikoptic

Collaborator Author

There is always a compromise to strike. That is why BIDS offers flexibility plus "redundancy" (e.g. via IP + summarization) to cater to a wider range of use cases: one size fits all might not work.

By "unwieldy", in particular for the sub-.../sub-..._sessions.tsv file, do you mean too wide (too many columns)?

Store the multi-session survey data in /dataset/subject/session/phenotype, but currently the phenotype directory is only allowed at the top level

Thanks for bringing it up too! I recall some discussions or "brainstorming" toward generalizing and allowing for "symmetry" of datatype folders (this could also apply to stimuli/, as initiated e.g. in good old #751). In part it relates to #1809, but I would not go there ATM ;) I have yet to take a fresh look at #2123 before our meetup with @ericearl next week, so I will not comment on that relation yet.

@yarikoptic

Collaborator Author

Re "unwieldy" again: since I feel like I am on a mission of promotion, have you seen or tried https://www.visidata.org ?

@yarikoptic yarikoptic marked this pull request as draft May 7, 2026 17:43
@yarikoptic

Collaborator Author

Since I do not think this should be merged without more extended discussion, I moved it to draft "for protection".

